Search Result

Sensitive information detection method based on attention mechanism-based ELMo

Cheng HUANG, Qianrui ZHAO

Journal of Computer Applications 2022, 42 (7): 2009-2014. DOI: 10.11772/j.issn.1001-9081.2021050877

To address the low accuracy and poor generalization of traditional sensitive information detection methods, such as keyword character matching and phrase-level sentiment analysis, a sensitive information detection method based on Attention mechanism-based Embedding from Language Model (A-ELMo) was proposed. Firstly, fast trie-based matching was performed to significantly reduce comparisons against irrelevant words, thereby greatly improving query efficiency. Secondly, an Embedding from Language Model (ELMo) was constructed for context analysis, and dynamic word vectors were used to fully represent contextual features and achieve high scalability. Finally, an attention mechanism was incorporated to strengthen the model's ability to identify sensitive features and further improve the detection rate of sensitive information. Experiments were carried out on real datasets drawn from multiple network data sources. The results show that the accuracy of the proposed method is 13.3 percentage points higher than that of the phrase-level sentiment analysis-based method and 43.5 percentage points higher than that of the keyword matching-based method, verifying that the proposed method strengthens the identification of sensitive features and improves the detection rate of sensitive information.
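
The trie-based matching step can be illustrated with a minimal sketch. The abstract does not describe the authors' data structure in detail, so the class and function names below (`TrieNode`, `build_trie`, `find_matches`) and the sample keywords are hypothetical; the sketch only shows how a trie lets a scan abandon a candidate as soon as no keyword shares its prefix.

```python
class TrieNode:
    """One node of a character trie (hypothetical helper, not from the paper)."""
    def __init__(self):
        self.children = {}
        self.is_end = False  # True if a keyword ends at this node


def build_trie(keywords):
    """Insert every keyword into a trie, one character per level."""
    root = TrieNode()
    for word in keywords:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_end = True
    return root


def find_matches(text, root):
    """Return (start_index, keyword) for every keyword occurrence in text."""
    matches = []
    for start in range(len(text)):
        node = root
        for end in range(start, len(text)):
            node = node.children.get(text[end])
            if node is None:
                break  # no keyword continues with this character
            if node.is_end:
                matches.append((start, text[start:end + 1]))
    return matches


# Illustrative keywords only; a real sensitive-word list would be much larger.
trie = build_trie(["leak", "leaked", "password"])
hits = find_matches("the leaked password", trie)
```

Most start positions fail on the first character lookup, which is the source of the reduced comparisons the abstract refers to.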

Text segmentation model based on graph convolutional network

Yuqi DU, Jin ZHENG, Yang WANG, Cheng HUANG, Ping LI

Journal of Computer Applications 2022, 42 (12): 3692-3699. DOI: 10.11772/j.issn.1001-9081.2021101768

The main task of text segmentation is to divide a text into several relatively independent blocks according to topic relevance. To address the shortcomings of existing text segmentation models in extracting fine-grained features such as paragraph structural information, semantic correlation and context interaction, a text segmentation model based on Graph Convolutional Network (GCN), named TS-GCN (Text Segmentation-Graph Convolutional Network), was proposed. Firstly, a text graph was constructed from the structural information and semantic logic of text paragraphs. Then, semantic similarity attention was introduced to capture fine-grained correlations between paragraph nodes, and GCN was used to pass information between the high-order neighborhoods of paragraph nodes, which enhanced the model's ability to extract multi-granularity topic feature representations of paragraphs. The proposed model was compared with the representative model CATS (Coherence-Aware Text Segmentation) and its base model TLT-TS (Two-Level Transformer model for Text Segmentation), both commonly used as benchmarks for the text segmentation task. Experimental results show that the evaluation index P_k of TS-GCN is 0.08 percentage points lower than that of TLT-TS without any auxiliary module on the Wikicities dataset. On the Wikielements dataset, the P_k of the proposed model is 0.38 and 2.30 percentage points lower than those of CATS and TLT-TS, respectively. These results indicate that TS-GCN achieves good segmentation performance.
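
P_k is an error rate, so lower values are better. The sketch below shows one common way to compute it, assuming each segmentation is given as a list of segment ids, one per paragraph, and that the window size defaults to half the average reference segment length. The function name `pk_score` and the input encoding are assumptions for illustration, not the evaluation code used in the paper.

```python
def pk_score(reference, hypothesis, k=None):
    """P_k segmentation error: fraction of windows where the two
    segmentations disagree on whether the window ends share a segment.

    reference, hypothesis: lists of segment ids, one per text unit,
    with a distinct id for each segment (an assumed encoding).
    """
    if len(reference) != len(hypothesis):
        raise ValueError("segmentations must cover the same units")
    n = len(reference)
    if k is None:
        # Half the average reference segment length, at least 1.
        num_segments = len(set(reference))
        k = max(1, round(n / (2 * num_segments)))
    if k >= n:
        raise ValueError("window size must be smaller than the text length")

    errors = 0
    for i in range(n - k):
        same_in_ref = reference[i] == reference[i + k]
        same_in_hyp = hypothesis[i] == hypothesis[i + k]
        errors += same_in_ref != same_in_hyp
    return errors / (n - k)


reference = [0, 0, 0, 0, 1, 1, 1, 1]   # two segments of four paragraphs
perfect = pk_score(reference, reference)
no_boundary = pk_score(reference, [0] * 8)  # hypothesis misses the boundary
```

A hypothesis that reproduces the reference scores 0, while one that misses the only boundary is penalized in every window that straddles it.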

Reinforced automatic summarization model based on advantage actor-critic algorithm

DU Xixi, CHENG Hua, FANG Yiquan

Journal of Computer Applications 2021, 41 (3): 699-705. DOI: 10.11772/j.issn.1001-9081.2020060837

In long text automatic summarization, extractive summarization models tend to produce redundant summaries, while abstractive summarization models often lose key information, summarize inaccurately and generate repeated content. To solve these problems, a Reinforced Automatic Summarization model based on the Advantage Actor-Critic algorithm (A2C-RLAS) for long text was proposed. Firstly, the key sentences of the original text were extracted by an extractor based on a hybrid neural network combining Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN). Then, the key sentences were refined by a rewriter based on the copy mechanism and the attention mechanism. Finally, the Advantage Actor-Critic (A2C) algorithm from reinforcement learning was used to train the entire network, with the semantic similarity between the rewritten summary and the reference summary, measured by BERTScore (Evaluating Text Generation with Bidirectional Encoder Representations from Transformers), used as the reward to guide the extraction process and thereby improve the quality of the sentences selected by the extractor. Experimental results on the CNN/Daily Mail dataset show that, compared with the Reinforcement Learning-based Extractive Summarization (Refresh) model, the Recurrent Neural Network based sequence model for extractive summarization (SummaRuNNer) and the Distributional Semantics Reward (DSR) model, A2C-RLAS produces final summaries with more accurate content, more fluent language and effectively reduced redundancy, and improves both the ROUGE (Recall-Oriented Understudy for Gisting Evaluation) and BERTScore indicators. Compared with the Refresh and SummaRuNNer models, the ROUGE-L value of A2C-RLAS is increased by 6.3% and 10.2% respectively; compared with the DSR model, its F1 value is increased by 30.5%.
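
ROUGE-L, one of the indicators reported above, is based on the longest common subsequence (LCS) between a candidate summary and a reference summary. The sketch below is a simplified illustration that assumes whitespace tokenization and the balanced F1 form of the score; the function names are hypothetical, and the official ROUGE toolkit adds options (stemming, a weighted F-measure, multi-sentence handling) that are not reproduced here.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """Simplified ROUGE-L F1 over whitespace tokens (an assumed variant)."""
    cand_tokens = candidate.split()
    ref_tokens = reference.split()
    lcs = lcs_length(cand_tokens, ref_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand_tokens)
    recall = lcs / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


score = rouge_l_f1("the cat sat on the mat", "the cat lay on the mat")
```

Here the two six-token sentences share a five-token subsequence, so precision, recall and F1 are all 5/6.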

Target detection of carrier-based aircraft based on deep convolutional neural network

ZHU Xingdong, TIAN Shaobing, HUANG Kui, FAN Jiali, WANG Zheng, CHENG Huacheng

Journal of Computer Applications 2020, 40 (5): 1529-1533. DOI: 10.11772/j.issn.1001-9081.2019091694

Carrier-based aircraft on a carrier deck are densely parked and occlude one another, which makes them difficult to detect, and the detection effect is easily affected by lighting conditions and target size. Therefore, a carrier-based aircraft target detection method based on improved Faster R-CNN (Faster Region with Convolutional Neural Network) was proposed. In this method, a loss function with a repulsion loss strategy was designed and combined with multi-scale training, and pictures collected under laboratory conditions were used to train and test the deep convolutional neural network. Test experiments show that, compared with the original Faster R-CNN detection model, the improved model detects occluded aircraft targets better, with recall increased by 7 percentage points and precision increased by 6 percentage points. The experimental results show that the proposed method can automatically and comprehensively extract the features of carrier-based aircraft targets, solves the detection problem of occluded carrier-based aircraft targets, achieves detection accuracy and speed that meet practical needs, and is adaptable and robust under different lighting conditions and target sizes.
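
The abstract does not give the form of the loss function. As a hedged illustration only, the sketch below computes two box-overlap measures that occlusion-aware detection losses are commonly built on: intersection over union (IoU) and intersection over ground truth (IoG). The `(x1, y1, x2, y2)` box format and all function names are assumptions, and this is not the authors' loss.

```python
def box_area(box):
    """Area of an axis-aligned box given as (x1, y1, x2, y2)."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def intersection_area(a, b):
    """Area of the overlap between two boxes (0 if they are disjoint)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, width) * max(0.0, height)


def iou(a, b):
    """Intersection over union: overlap relative to the combined area."""
    inter = intersection_area(a, b)
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


def iog(predicted, ground_truth):
    """Intersection over ground truth: how much of a ground-truth box
    a predicted box covers; high values against a neighbouring target
    indicate the prediction has drifted onto the wrong object."""
    gt_area = box_area(ground_truth)
    return intersection_area(predicted, ground_truth) / gt_area if gt_area > 0 else 0.0


pred = (0.0, 0.0, 2.0, 2.0)
neighbour = (1.0, 1.0, 3.0, 3.0)
overlap_iou = iou(pred, neighbour)
overlap_iog = iog(pred, neighbour)
```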

One-site multi-table and cross multi-table frequent item sets mining with privacy preserving

LIN Rui, ZHONG Cheng, HUA Pei

Journal of Computer Applications 2013, 33 (12): 3437-3440.

To ensure that personal and original information is not disclosed among parties when several parties cooperatively mine data tables located at different computational sites, a triple-site cross multi-table frequent item sets mining algorithm with privacy preserving, based on a secure triple-party protocol, was proposed for a distributed environment with multiple tables at each site. The proposed algorithm perturbed data by generating random numbers, mined inter-site frequent item sets in parallel, linked records with equal values on the common link attributes of the tables across sites, and applied the secure protocol to compute the global support of inter-site cross-table frequent item sets. The experimental results show that the proposed algorithm is efficient, and that it can not only mine cross multi-table frequent item sets but also preserve the private data at each site.
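
The abstract does not spell out the secure triple-party protocol. As a rough sketch of the general idea of hiding local counts behind random numbers, the example below uses a generic ring-based secure sum, which is an assumed stand-in rather than the authors' protocol: the first site adds a random mask to its count, each later site adds its own count to the running total, and the first site removes the mask at the end. All names and the modulus are hypothetical, and the simulation runs in one process instead of across real sites.

```python
import secrets


def secure_sum(local_counts, modulus=2**61 - 1):
    """Simulate a ring-based secure sum of per-site support counts.

    Each intermediate value passed between sites is offset by a random
    mask known only to the first site, so no site sees another site's
    raw count. Assumes the true total is smaller than the modulus.
    """
    mask = secrets.randbelow(modulus)           # chosen by site 1
    running = (mask + local_counts[0]) % modulus
    for count in local_counts[1:]:              # sites 2, 3, ... add in turn
        running = (running + count) % modulus
    return (running - mask) % modulus           # site 1 removes the mask


def is_globally_frequent(local_counts, total_transactions, min_support):
    """Check an item set against a global minimum support threshold."""
    return secure_sum(local_counts) / total_transactions >= min_support


# Three sites report local support counts for one candidate item set.
global_count = secure_sum([12, 7, 30])
frequent = is_globally_frequent([12, 7, 30], total_transactions=100, min_support=0.4)
```

This simple scheme assumes the sites follow the protocol and do not collude; a production design would need a stronger threat model.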